Linear Models

Model	Mathematical Expression	Description
Ordinary Least Squares	$\min_{w} \|Xw - y\|_2^2$	Fits a linear model with coefficients to minimize the residual sum of squares between observed and predicted targets. Sensitive to outliers; not robust if features are correlated (multicollinearity).
Ridge Regression	$\min_{w} \|Xw - y\|_2^2 + \alpha \|w\|_2^2$	Adds L2 regularization to the model to address some of the problems of Ordinary Least Squares. More robust to multicollinearity; has a bias-variance trade-off controlled by $\alpha$.
Lasso Regression	$\min_{w} \frac{1}{2n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \|w\|_1$	Adds L1 regularization to enforce sparsity of the coefficient vector. Useful for feature selection; produces models with fewer non-zero coefficients.
Elastic Net	$\min_{w} \frac{1}{2n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha (1-\rho)}{2} \|w\|_2^2$	Combines L1 and L2 regularization to control the complexity of the model with two parameters, $\alpha$ (overall strength) and $\rho$ (L1 share). Balances between Ridge and Lasso; useful when there are correlations among features.
Logistic Regression	$\min_{w, c} \sum_{i=1}^{n} \log (1 + \exp (-y_i (X_i^T w + c)))$	Binary classifier that estimates class probabilities using the logistic function. The expression shown assumes labels $y_i \in \{-1, 1\}$ and includes no regularization term.
Polynomial Regression	Linear least squares on polynomial features of $X$; the exact form depends on the chosen degree.	Extends linear models by adding polynomial terms, which allows fitting a broader range of data. Can fit non-linear patterns; beware of overfitting with high-degree polynomials.
RidgeCV	Same as Ridge, with $\alpha$ chosen by cross-validation (CV).	Ridge regression with built-in cross-validation over candidate values of $\alpha$. Convenient for automating the choice of regularization strength.
LassoCV	Same as Lasso, with $\alpha$ chosen by cross-validation (CV).	Lasso regression with built-in cross-validation for selecting the best value of $\alpha$. Efficient for high-dimensional data; automates $\alpha$ selection.
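
To make the difference between the OLS, Ridge, and Lasso rows concrete, here is a minimal sketch in plain Python. It assumes the simplest possible setting, a single feature and no intercept, where each objective in the table has a closed-form minimizer. The function names (`fit_ols_1d`, `fit_ridge_1d`, `fit_lasso_1d`, `soft_threshold`) and the toy data are hypothetical and chosen only for illustration; they are not part of any library.

```python
# Closed-form 1-D minimizers of the OLS, Ridge, and Lasso objectives above.
# Assumptions: one feature, no intercept, x is not all zeros.

def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def _sums(x, y):
    """Return (sum of x_i^2, sum of x_i * y_i)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxx, sxy

def fit_ols_1d(x, y):
    """Minimize ||x w - y||_2^2."""
    sxx, sxy = _sums(x, y)
    return sxy / sxx

def fit_ridge_1d(x, y, alpha):
    """Minimize ||x w - y||_2^2 + alpha * w^2."""
    sxx, sxy = _sums(x, y)
    return sxy / (sxx + alpha)

def fit_lasso_1d(x, y, alpha):
    """Minimize (1 / (2 n)) * ||x w - y||_2^2 + alpha * |w|."""
    n = len(x)
    sxx, sxy = _sums(x, y)
    return soft_threshold(sxy / n, alpha) / (sxx / n)

# Toy data generated by y = 2 x with no noise.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

print("OLS:            ", fit_ols_1d(x, y))
print("Ridge, alpha=10:", fit_ridge_1d(x, y, 10.0))
print("Lasso, alpha=1: ", fit_lasso_1d(x, y, 1.0))
print("Lasso, alpha=20:", fit_lasso_1d(x, y, 20.0))
```

On this toy data OLS recovers the slope 2, Ridge shrinks it toward zero without ever reaching it, and Lasso sets it to exactly zero once $\alpha$ is large enough. That is the sparsity behaviour described in the Lasso row. With several correlated features no such one-line formula exists for Lasso, and an iterative solver is needed instead.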